Abstract

Proteins are regularly described with general indices (mass fractal dimension, surface fractal dimension,
entropy, enthalpy, free energies, hydrophobicity, denaturation temperature, etc.) that are inherently statistical
in nature. These general indices emerge from innumerable (innately context-dependent and time-dependent)
interactions between the various atoms of a protein. Many studies have been performed on the nature of these
interatomic interactions and on the changes in the profile of atomic fluctuations that they cause. However, we
still do not know how, under a given context and for a given duration of time, a macroscopic biophysical property
emerges from the cumulative interatomic interactions. An exact answer to that question will involve bridging
the gap between the nano-scale description of distinguishable atoms and macroscopic (statistical) measures of
indistinguishable ones, along the mesoscopic scale of observation. In this work we propose a computationally
implementable mathematical model that derives expressions for the observability of the emergence of a macroscopic
biophysical property from a set of interacting (fluctuating) atoms. Since most of the aforementioned interactions
are non-linear in nature, observability criteria are derived for both the linear and the non-linear descriptions
of the protein interior. The study is important for 21st-century biology from both the theoretical and the
practical point of view. It helps the theoretical discourse by providing a framework to understand the origin of
a macroscopic property; and its ability to predict a priori whether the dynamics in a certain set of atoms, or the
couplings between them, can produce a biological property of interest at all could save considerable resources
and effort.

1 Introduction



Recent works have described proteins as 'complex systems' [1, 2] and as 'deformable polymers' [3]. The mesoscopic
nature of protein structures has been reported by crystallographers too [4]. Furthermore, it has been found recently
that proteins exist in a state of 'self-organized criticality' [5, 6]. Along with all these, recent [7] and previous [8]
characterizations of inhomogeneous distributions of mass and hydrophobicity complicate any effort to construct a
general and unambiguous scheme for describing the protein interior. An approach that captures the inhomogeneous,
nonlinear behavior of protein structural parameters can be constructed by describing them through self-similarity.
Indeed, many previous studies on this topic (references [9], [10-18] are only a small sample) had hinted that, with
an objective quantification of self-similarity, we can decipher the hidden symmetry that connects global patterns
of macroscopic properties in proteins (say, hydrophobicity distribution, polarizability distribution, etc.) with
the local (atomic) interactions that produce them [19].

However, the basic question of precisely when the macroscopically measurable quantities emerge from the
microscopic interactions between the atoms could not be answered by any of these approaches. Such an
examination of the protein interior is necessary not only for purely theoretical discourse, but also for emerging
practical applications that attempt to describe proteins from the paradigms of nanotechnology, nano-science and
mesoscopic science. Protein function is a dynamic property that comes into being through conformational changes
of the protein structure in its physiological environment. Hence, as has been commented upon in a recent study [20],
to understand and control the function of target proteins it is essential to develop methods that can analyze the
collective motions at the molecular level, from which the macroscopic properties emerge. This question is therefore
of immense importance to contemporary protein design and protein engineering studies. The question can
alternatively be posed from the perspective of control systems theory, viz. when does a particular biophysical
property become observable in a subset of protein atoms? Equivalently: can a sub-system of protein atoms be called
observable with respect to a particular biophysical property? This question admits three answers: the sub-system
of atoms in question is completely observable with respect to the biophysical property under consideration, it is
partially observable, or it is not observable at all. Derivation of the precise mathematical conditions for these
three cases forms the focus of the present work. These mathematical conditions can be implemented algorithmically,
which relates the general theoretical framework to particular simulation-oriented studies that can be applied
directly to practical situations.

Since a protein comprises a large number of atoms, a realistic scheme that attempts to describe any global
biophysical property (say, the free energy of a protein, its radius of gyration, its hydrophobic fractal dimension,
its resultant dipole moment, etc.) from the relevant property associated with individual atoms (say, the
individual spatial fluctuation of each atom, the mass and volume of an atom, the partial charge on each atom, etc.)
becomes difficult to construct, analyze and understand. The need, therefore, is to construct a method that is
mathematically accurate but at the same time easily implementable algorithmically: a method that can reduce the
sheer scale of dimensionality (huge number of atoms, many properties, etc.) associated with the problem. Such a
method can only be realized if it can detect commonalities in patterns across different biophysical properties.
Since the properties under consideration are all macroscopic in nature, the problem becomes especially difficult
to pose if studies are attempted with fewer atoms than the threshold required for the emergence of the property.
However, although complicated, construction of this algorithm holds enormous importance for several paradigms of
protein biophysics. A general mathematical construct to address this problem should be able to provide a simple yet
reliable framework to describe and analyze the connection between individual properties at the atomic scale and
globally emergent properties at the protein scale. In this work, we propose such an algorithm. Indeed, over the
years some attempts have been made in protein biophysics to establish the relationship between coupled microscopic
fluctuations and their effect on macroscopic behaviour [21, 22], but the scope of these efforts was limited
to certain specialized fields and was not general. The present work, on the other hand, attempts to construct the
generalized conditions to achieve the same.

Since few works have attempted to view proteins as mesoscopic systems, it is important at this point to clarify
the exact goal of the present paper. Macroscopic states describe a system from a top-down perspective with
variables of (mostly) statistical origin. In other words, a macroscopic description of the system contains a lower
level of information than a microscopic description of the same system. However, since the measurable variables
themselves are macroscopic, a macroscopic description still meets the given requirements of accuracy. In the realm
of protein biophysics, consideration of macroscopic states like temperature, pressure, enthalpy or entropy is
sufficient to describe the behavior of the system, and knowledge of microscopic states like the position and
velocity of each of the involved molecules is not always necessary. The transition from the microscopic level
of description to the macroscopic one, however, is not continuous: if we consider merely these two modes of
description of a system (microscopic and macroscopic), the transition takes place in a step-function-like
discontinuous manner at the statistical parametric limit (number of components of the system = 32). Such a scheme
of description of biophysical properties might not always be correct. Properties do not emerge suddenly, but
acquire their special features and characteristics gradually as the description turns from the microscopic to the
macroscopic paradigm. Mesoscopic states are states containing the intermediate details. It is at this particular
scale that we can expect to observe the origin and gradual coming into being of (most of the) biophysical
properties. Hence, it is of enormous importance to construct objective frameworks compatible with the mesoscopic
scale of protein description, so that one can scrutinize the multifaceted characteristics of the origin and
development of the biophysical property of interest.

The basic and only assumption of the present work is that the emergence of macroscopic properties can be studied
with accuracy and consistency by studying the interatomic interaction profiles in sufficient detail. Assertions
from some recent works [23-25] support this assumption strongly. Interatomic interactions manifest themselves
through their fluctuation profiles. Interatomic fluctuations are so important to a protein's existence that any
instantaneous conformation of a protein is known to fluctuate thermally around its native conformation. In the
interior of the protein, atoms are tightly packed and the interactions between the atoms are complicated, but the
fluctuations prevail there too. These fluctuations have been studied from various perspectives. The thermal,
conformational fluctuations of a globular protein have been decomposed into collective motions and studied from
various perspectives [26-30]. In normal mode analysis, the fluctuations are expressed as a linear combination of
normal modes [28-30]. However, to our knowledge, construction of a method to observe the emergence of a
macroscopically measured property from the microscopic fluctuations through a mesoscopic limit had not been tried
before. The present algorithm attempts to trace any macroscopically measured property of statistical origin back
to time-dependent and context-dependent microscopic fluctuations, to observe at which mesoscopic limit of the
number of atoms the property emerges and how it grows gradually before attaining its macroscopic statistical nature.

This task is daunting because the probabilities and structural features of the entire spectrum of microstates
sampled by proteins are not clearly known [31, 32]. The sensitivity of the ensemble of microstates to changes in
environmental conditions (i.e., pH, temperature, pressure, ligand binding, and concentrations of osmolytes and
denaturants) is not well understood either [32]. Most importantly, the manner in which local fluctuations are
coupled to larger, more global structural transitions is not known. Hence a mathematical construct that attempts
to model the situation must essentially be top-down in its approach (to circumvent tackling the time-dependent
couplings between each and every local fluctuation), yet extract the necessary information regarding the emergence
of any biophysical property. Rather than assuming the aforementioned interatomic interactions to be linear, we have
resorted to modelling the situation from a nonlinear perspective. The reasons behind this choice, and its relevance
to the present study, are numerous. Proteins fold in the crowded milieu of the cell, where the density of protein is
approximately 0.2-0.3 g/mL. Hence the huge number of intramolecular interatomic interactions (formation, breakage
and reformation of hydrogen bonds, salt bridges, etc.) take place in competition with similar intermolecular
interatomic interactions, which, of course, may be detrimental to the folding process [33]. Despite this, the
process usually yields the native state in a matter of milliseconds to seconds. This inherent swiftness of the
process implies a series of ordered events or intermediates, a fact that has intrigued researchers for several
decades [34], [35]. However, the order in which these intramolecular events take place is not known in general
terms, even today. Evolution of a biophysical property from its mesoscopic limit to its macroscopic limit (the
statistical parametric limit of 32 atoms) can be studied by implementing the present (general) mathematical
framework. Careful (computational) implementation of such a mathematical framework can (possibly) resolve the
contradictions in protein folding/unfolding mechanisms, as elaborated in an earlier study [23]. (Some models of
the denatured state bank either on the sum of individual amino acids [36, 37] or on an extended, completely
solvent-exposed polypeptide chain [38, 39]; this assumption is at odds with experimental evidence showing that the
denatured state in the absence of denaturants is rather compact [40, 41].) Thus, the present study is not merely a
theoretical pursuit but has several practical uses too.



2 Methodology

Application of control theoretic constructs to the study of biological systems is not exactly common, but such
rigorous mathematical constructs are found in many recent works. All of these works, in some form or other,
demonstrate that reconstruction of specific regulated states under conditions of limited information can be
achieved efficiently through control theoretic constructs. While many of these (pioneering) deductions are
applicable to paradigms in systems biology [42-44], isolated instances of insightful treatment of genetic
regulation can be found too [45]. Consideration of the role of the observer [46, 47] in an essentially nonlinear
paradigm of systemic description of biological systems is successfully achieved in the latter. However, we did not
find control theoretic studies of protein biophysical factors from a similarly rigorous standpoint. Several works
on control theoretic constructs attempt to model systems from a (time-invariant) linear perspective [48, 49].
Since such a formalism is not relevant to the description of biological systems, the treatment in the present work
is time-dependent throughout. On the other hand, since we are attempting to observe the emergence of protein
biophysical features (at a mesoscopic scale), the state-space oriented control studies (as have been attempted in
some biological paradigms [50, 51]) are not explicitly touched upon in the present study. We note that although
the possibility of applying control theory to protein structure prediction through NMR was discussed in a previous
article [52], the actual conditions for observability of the emergence of a (statistical) macroscopic property
were not obtained there.

While the present study owes its philosophical basis to the aforementioned studies (and many others [53, 54]),
it differs from all of them because, to our knowledge, this is the first attempt to propose a theoretical
framework that attempts to observe how the measurable macroscopic biophysical properties of proteins come into
being from (microscopic) time-dependent and context-dependent interatomic interactions.

Having established the reason behind such studies, we now embark on the derivation of the algorithm. This is done
in two parts. In the first section, the definition of a protein from the perspective of control theory is put into
place. The next section then approaches the problem in a step-by-step manner to deduce the conditions that
unambiguously define whether any (sub)set of atoms of a protein is completely observable, partially observable or
not observable with respect to a biophysical property. Calculations for the (approximated) linear case are also
included, because in certain particular cases the computational implementation of the non-linear case may become
difficult. The actual description of the process, however, can only be found from the derivations of the
non-linear case.

Section - 1) : Definition of the system (a single protein)

Case-1) General representation of protein interior parameters with linear differential equation. 

We describe these time-dependent and context-dependent correlations amongst protein structural parameters
objectively by representing any arbitrarily chosen protein with a linear differential equation that has
sufficient capacity to describe the (time-dependent) dynamic dependencies of protein structural parameters on one
another. Such an approach provides a simple, computationally implementable framework with adequate rigor,
given by :

$$\dot{x}(t) = A(t)\,x(t) + f(t) \tag{1}$$

where $x$ is an $n$-vector, $A(t)$ is an $n \times n$ continuous matrix on an open interval $I$ in $\mathbb{R}$,
and $f(t)$ is locally square integrable on some arbitrarily chosen interval $(a, b)$, viz. $f \in L^2_n([a, b])$.
In other words, $f$ belongs to the space of all measurable $n$-vector functions $f(t)$ defined for
$t \in [a, b] = J$, with values $f(t) \in \mathbb{R}^n$, $t \in J$, such that $\int_a^b |f(t)|^2\,dt < \infty$.

We can write eq. (1) in the following equivalent integral form, with $x(t_0) = x_0$ :

$$x(t) = x_0 + \int_{t_0}^{t} \left[ A(s)\,x(s) + f(s) \right] ds \tag{2}$$

Subsequently we can define the successive approximations by the relations :

$$x_0(t) = x_0$$

and

$$x_{n+1}(t) = x_0 + \int_{t_0}^{t} \left[ A(s)\,x_n(s) + f(s) \right] ds, \quad t \in J, \quad n = 0, 1, 2, \ldots$$

The solution of eq. (2) with $x(t_0) = x_0$ is given by :

$$x(t) = X(t, t_0)\,x_0 + \int_{t_0}^{t} X(t, s)\,f(s)\,ds \tag{3}$$

where $X(t, t_0)$ is the fundamental matrix solution of the homogeneous equation $\dot{x}(t) = A(t)\,x(t)$,
which has the following properties :

1) $X(t_0, t_0) = I$ (the identity matrix).

2) $X(t, t_0) = X(t, s)\,X(s, t_0)$, $\; t_0 \le s \le t$

3) $X(t, s) = X^{-1}(s, t)$
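
As a minimal numerical sketch of the three properties above, the snippet below uses a toy constant-coefficient system, $A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$, for which the fundamental matrix is known in closed form (a plane rotation). The matrix $A$, the sample times and the helper names (`transition`, `matmul`, `close`) are illustrative assumptions and not part of the protein model; a genuinely time-dependent $A(t)$ would in general require numerical integration instead of a closed form.

```python
import math

def transition(t, s):
    """X(t, s) for dx/dt = A x with the toy matrix A = [[0, 1], [-1, 0]] (closed form)."""
    d = t - s
    return [[math.cos(d), math.sin(d)],
            [-math.sin(d), math.cos(d)]]

def matmul(P, Q):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(P, Q, tol=1e-12):
    """True when two 2x2 matrices agree entrywise within tol."""
    return all(abs(P[i][j] - Q[i][j]) <= tol for i in range(2) for j in range(2))

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
t0, s, t = 0.3, 1.1, 2.5  # arbitrary sample times with t0 < s < t

# Property 1: X(t0, t0) = I
prop1 = close(transition(t0, t0), IDENTITY)
# Property 2: X(t, t0) = X(t, s) X(s, t0)
prop2 = close(transition(t, t0), matmul(transition(t, s), transition(s, t0)))
# Property 3: X(t, s) is the inverse of X(s, t), i.e. their product is I
prop3 = close(matmul(transition(t, s), transition(s, t)), IDENTITY)
print(prop1, prop2, prop3)
```

The same three checks can serve as a sanity test for any numerically integrated fundamental matrix before it is used in the observability conditions.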

Case-2) General representation of protein interior parameters with non-linear differential equation. 

Of course, a representation scheme similar to eq. (3) can be obtained for a non-linear differential equation of the
form

$$\dot{x}(t) = A(t)\,x(t) + f(t, x) \tag{4}$$

where $f(t, x)$ is continuous on $J \times \mathbb{R}^n$. For eq. (4), the solution $x(t)$ with $x(t_0) = x_0$ can
be written as :

$$x(t) = X(t, t_0)\,x_0 + \int_{t_0}^{t} X(t, s)\,f(s, x(s))\,ds \tag{5}$$

where $X(t, t_0)$ is the fundamental matrix solution of the homogeneous equation. With this non-linear
representation the complexity of the solution undoubtedly increases, but under suitable conditions on $A$ and $f$
one can establish the existence of a solution of the non-linear equation (4).

Section - 2) : Criteria of observability of protein interior parameter dynamics. 

Based on the definition of the system (that is, an arbitrarily chosen protein) provided above, we now proceed to
derive the conditions for observability of the emergence of any biophysical property from any arbitrarily chosen
(sub)set of atoms belonging to the protein.

Case-1) : 

Observability, under the assumption that proteins are linear systems : 

We continue with our description of the interactions of various structural parameters in the protein interior, and
of the emerging property, with the differential equation $\dot{x}(t) = A(t)\,x + f(t)$ (all the symbols retain
their meaning from eq. (1)).

Here, in section-2, we describe the process when eq. (1) is subjected to a linear observation process of the
simple form :

$$y = O(t)\,x + \Theta(t)\,f, \quad y \in \mathbb{R}^n \tag{6}$$

Assuming that the interaction of the various structural parameters takes place in a time interval
$[t_0, t_1] \subset (a, b)$ and that $x(t_0) = x_0 \in \mathbb{R}^n$, we had arrived at eq. (3). We start the
derivation necessary to describe observability of the protein with the aforementioned structural parameters by
noting, first, that observation of the phenomenon in question is itself a time-dependent process; and second, that
if $f$ is a known function, for example $f(t) = B(t)\,u(t)$ with $u(t)$ being a control, then in principle the term
$\Theta(t)\,f$ of eq. (6) and $O(t)$ times the integral of eq. (3) can be subtracted from :

$$y(t) = O(t)\,X(t, t_0)\,x_0 + O(t) \int_{t_0}^{t} X(t, s)\,f(s)\,ds + \Theta(t)\,f(t)$$

to yield the modified closed form expression for the observation, given by :

$$y(t) = O(t)\,X(t, t_0)\,x_0 \tag{7}$$

The term $X(t, t_0)$ in eq. (7) satisfies the homogeneous equation

$$\dot{x} = A(t)\,x \tag{8}$$

Therefore the expression that represents linear observation of a protein whose interior structural dependencies
can be described by a linear differential equation is obtained as :

$$y(t) = O(t)\,x(t) \tag{9}$$

Hence, the original question about obtaining information about a system described by eq. (1) with the help of an
observation scheme described by eq. (6) reduces to the same question for the corresponding homogeneous system,
described by eq. (8), and the homogeneous observation described by eq. (9). This change of question marks a change
in the mode of study, because the present platform (comprised of eq. (8) and eq. (9)) offers a homogeneity in the
treatment of the problem in general, something that was not ensured in the platform comprised of eq. (1) and
eq. (6).

However, to make meaningful predictive studies, the present framework needs to be modified further with a
suitable description of the temporal frame of reference. Thus, without any loss of generality, we perform a
translation of the origin, so that $\tau = t - t_0$. This accounts for the limits $t_0 \to 0$, whereby
$t_1 \to (t_1 - t_0) = T$.



To formalize the problem, we define observability as follows: the system represented by eq. (8) and eq. (9) is
observable (that is, the pair $(O(t), A(t))$ is observable) on the time interval $[0, T]$
iff $y(t) = O(t)\,x(t) = 0$, $t \in [0, T]$, implies $x(t) = 0$, $t \in [0, T]$
(which is equivalent to the assertion $x(0) = x_0 = 0$).

Thus, the re-defined version of the problem is to identify (and/or develop) the conditions for observability on 
the matrices A (t) and O (t). 

We approach the problem by denoting the space of square integrable $r$-vector functions on $[0, T]$ by
$L^2_r[0, T]$.

At this point we propose theorem-1, the theorem of 'connection between independence of protein structural
parameters and their corresponding observability'.

Theorem-1) :

If the structural parameters corresponding to any protein can be represented by vectors $x_1, x_2, \ldots, x_k$ in
finite dimensional Euclidean space $\mathbb{R}^n$, and if $x_1(t), x_2(t), \ldots, x_k(t)$ are the corresponding
solutions of eq. (8) in $[0, T]$ with $x_i(0) = x_i$, $i = 1, 2, \ldots, k$; further, if the corresponding
observations $y_i$ on $[0, T]$ are defined by $y_i(t) = O(t)\,x_i(t)$, $t \in [0, T]$; then the observed linear
system described by eq. (8) and eq. (9) is observable on $[0, T]$ if and only if the $y_i$ are linearly independent
in $L^2_r[0, T]$ whenever the $x_i$ are linearly independent in the same finite dimensional Euclidean space
$\mathbb{R}^n$.

Proof :

The solutions $x_i(t)$ are linearly independent in $L^2_n[0, T]$ only in the case when the $x_i$ are linearly
independent in $\mathbb{R}^n$. If the system of eq. (8) and eq. (9) is observable and

$$y(t) = \sum_{i=1}^{k} c_i\,y_i(t) = 0 \tag{10}$$

then the corresponding solution also vanishes. In other words, it implies :

$$x(t) = \sum_{i=1}^{k} c_i\,x_i(t) = 0 \tag{11}$$

and in particular

$$\sum_{i=1}^{k} c_i\,x_i(0) = \sum_{i=1}^{k} c_i\,x_i = 0 \tag{12}$$

In the case where the $x_i$'s are linearly independent, we have $c_1 = c_2 = \ldots = c_k = 0$. Hence, from
eq. (10), we can conclude that in such a case the $y_i$'s will be linearly independent too.

On the other hand (evidently, in the more general case), if there exist linearly independent
$x_1, x_2, \ldots, x_k$ such that the associated observations, namely $y_1(t), y_2(t), \ldots, y_k(t)$, are not
independent (that is, are dependent in $L^2_r[0, T]$), then, choosing $c_1, c_2, \ldots, c_k$ not all zero such that

$$y(t) = \sum_{i=1}^{k} c_i\,y_i(t) = 0$$

we notice that $y(t)$ becomes an identically vanishing observation on the solution
$x(t) = \sum_{i=1}^{k} c_i\,x_i(t)$, which is not the zero solution of eq. (8), because in such a case
$x_1 = x_1(0)$, $x_2 = x_2(0)$, $\ldots$, $x_k = x_k(0)$ are linearly independent.

Hence, in such a case, we can conclude that the system of eq. (8) and eq. (9) is not observable. Q.E.D

The proof of theorem-1 ('connection between independence of protein structural parameters and their corresponding
observability') paves the way for a more general theorem, theorem-2, 'observability of protein structural
parameters as components of a linear system'.



Theorem-2) :

The system (protein) described by eq. (8) and eq. (9) is observable on the time interval $[0, T]$ iff the
observability Gramian matrix, given by :

$$\Phi(0, T) = \int_0^T X^*(t, 0)\,O^*(t)\,O(t)\,X(t, 0)\,dt$$

(where $X^*(t, 0)$ and $O^*(t)$ are the transposes of $X(t, 0)$ and $O(t)$) is positive definite.
Proof :

The solution $x(t)$ of eq. (8) corresponding to the initial condition $x(0) = x_0$ is given by :

$$x(t) = X(t, 0)\,x_0$$

and we obtain $y(t) = O(t)\,x(t) = O(t)\,X(t, 0)\,x_0$, so that

$$\|y\|^2 = \int_0^T y^*(t)\,y(t)\,dt = x_0^* \left[ \int_0^T X^*(t, 0)\,O^*(t)\,O(t)\,X(t, 0)\,dt \right] x_0 = x_0^*\,\Phi(0, T)\,x_0$$

which is a quadratic form in $x_0$.

Clearly $\Phi(0, T)$ is a symmetric $n \times n$ matrix.

If $\Phi(0, T)$ is positive definite then :

$$y = 0 \;\Rightarrow\; x_0^*\,\Phi(0, T)\,x_0 = 0 \;\Rightarrow\; x_0 = 0$$

and then the system described by eq. (8) and eq. (9) is observable on $[0, T]$. If $\Phi(0, T)$ is not positive
definite, then there exists some $x_0 \neq 0$ such that $x_0^*\,\Phi(0, T)\,x_0 = 0$.
In such a case, $x(t) = X(t, 0)\,x_0 \neq 0$ for $t \in [0, T]$, but since $\|y\|^2 = 0$, it implies $y = 0$.
Therefore, we can conclude that the system described by eq. (8) and eq. (9) is not observable on $[0, T]$.
Q.E.D
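
The Gramian criterion of theorem-2 can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not an implementation for a real protein: the two 2-state toy systems, the scalar observation $O = (1, 0)$, the midpoint-rule quadrature and the names `gramian` and `is_positive_definite` are all hypothetical choices made for the example. Both toy systems have constant $A$, so that $O\,X(t, 0)$ is available in closed form.

```python
import math

def gramian(output_row, T, steps=2000):
    """Midpoint-rule estimate of Phi(0, T) for a scalar observation.

    output_row(t) must return the row vector O(t) X(t, 0) as a 2-element list.
    """
    h = T / steps
    G = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(steps):
        c = output_row((k + 0.5) * h)
        for i in range(2):
            for j in range(2):
                G[i][j] += c[i] * c[j] * h
    return G

def is_positive_definite(G, tol=1e-9):
    """Sylvester's criterion for a symmetric 2x2 matrix, with a numerical tolerance."""
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    return G[0][0] > tol and det > tol

# Toy system 1: A = [[0, 1], [-1, 0]], O = [1, 0]  ->  O X(t, 0) = [cos t, sin t]
oscillator = lambda t: [math.cos(t), math.sin(t)]
# Toy system 2: A = diag(-1, -2), O = [1, 0]  ->  O X(t, 0) = [exp(-t), 0]
decoupled = lambda t: [math.exp(-t), 0.0]

T = 1.0
G_osc = gramian(oscillator, T)
G_dec = gramian(decoupled, T)
print(is_positive_definite(G_osc))  # second state reaches the output through the coupling
print(is_positive_definite(G_dec))  # second state never reaches the output
```

In the first toy system the unobserved component is coupled to the observed one, so the Gramian is positive definite; in the second the components are decoupled and the Gramian is singular, which is the non-observable case of the theorem.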

Corollary :

If the system described by eq. (8) and eq. (9) is observable on $[s, t]$, then it is also observable on any
interval $[0, T]$ such that $0 \le s < t \le T$.

Proof of corollary :

We have (in the sense of quadratic forms) :

$$\Phi(0, T) \ge X^*(s, 0)\,\Phi(s, t)\,X(s, 0) > 0$$

Hence the proof.

Case-2) :

Observability, under the assumption that proteins are non-linear systems :

In certain cases the dependencies amongst structural parameters within a protein might not be governed by
equations with simple linear dependencies. This is probable too, considering that interactions amongst protein
structural determinants are time-dependent and context-dependent. Hence, in such a case, rather than resorting to
the linear (simplistic and approximated) description, we have to describe proteins as non-linear systems. Here,
instead of referring to eq. (1), we start our description with eq. (4), viz : $\dot{x}(t) = A(t)\,x(t) + f(t, x)$,
where $x$ and $f$ are $n$-vectors and $t \in I$, the real time interval, with the linear observation
$y = O(t)\,x(t)$, where $y$ is an $m$-vector $(m < n)$, and $A(t)$, $f(t, x)$, $O(t)$ are continuous with respect
to their arguments.

Here, although we admit that the non-linear representation scheme is more general (and perhaps more accurate), we
describe the case mathematically as a linear system (eq. (4)) with perturbation $f(t, x)$. We assume further that
it is feasible to describe the situation as one where eq. (4) is being observed through a quantity $y$. In such a
framework, the problem of observability of eq. (4) can be formulated as follows: it is required to find the
unknown state at the present time $t$ from the quantity $y$ over the interval $[\theta, t]$, where $\theta$
denotes some past time, because, since $m < n$, the equation $y = O(t)\,x(t)$ does not allow $x$ to be found
immediately from $y$.

At this point, having loosely described the framework, we proceed to formally define observability of the
system :

NL-Defn-1) The system is defined to be observable at time $t$ if there exists $\theta < t$ such that the state of
the system at time $t$ can be identified from knowledge of the system output over the interval $[\theta, t]$.
NL-Defn-2) If the system is observable at every $t \in I$, it is called 'completely observable'.
NL-Defn-3) If the interval of output observation mentioned in NL-Defn-1 can be made arbitrarily small, we speak
of differential observability over those intervals.

We start our analysis of the non-linear description of the protein interior by assuming that eq. (4) has a unique
solution for any initial condition. If we denote by $\tau$ a time with $\theta < \tau < t$, the solution of eq. (4)
is uniquely defined for $x = x(\tau)$ as the initial condition and (drawing from the aforementioned non-linear
description of proteins in case-2 of section-1) is given by :

$$x(t) = X(t, \tau)\,x(\tau) + \int_{\tau}^{t} X(t, s)\,f(s, x(s))\,ds$$

However, since the fundamental matrix is invertible, we have :

$$x(\tau) = X(\tau, t)\,x(t) - \int_{\tau}^{t} X(\tau, s)\,f(s, x(s))\,ds \tag{13}$$

Correspondingly, $y(\tau)$ is given by :

$$y(\tau) = O(\tau)\,X(\tau, t)\,x(t) - O(\tau) \int_{\tau}^{t} X(\tau, s)\,f(s, x(s))\,ds \tag{14}$$

Denoting the transpose of a matrix with a star and rearranging (multiplying eq. (14) by
$X^*(\tau, t)\,O^*(\tau)$ from the left and integrating over the interval from $\theta$ to $t$), we obtain :

$$\begin{aligned}
\int_{\theta}^{t} X^*(\tau, t)\,O^*(\tau)\,y(\tau)\,d\tau
&= \int_{\theta}^{t} X^*(\tau, t)\,O^*(\tau)\,O(\tau)\,X(\tau, t)\,d\tau \; x(t) \\
&\quad - \int_{\theta}^{t} X^*(\tau, t)\,O^*(\tau)\,O(\tau) \int_{\tau}^{t} X(\tau, s)\,f(s, x(s))\,ds\,d\tau \\
&= \Phi(\theta, t)\,x(t) - \int_{\theta}^{t} X^*(s, t) \int_{\theta}^{s} X^*(\tau, s)\,O^*(\tau)\,O(\tau)\,X(\tau, s)\,d\tau \; f(s, x(s))\,ds \\
&= \Phi(\theta, t)\,x(t) - \int_{\theta}^{t} X^*(s, t)\,\Phi(\theta, s)\,f(s, x(s))\,ds
\end{aligned}$$


If the matrix $\Phi(\theta, t)$ is invertible, that is, if the truncated linear system

$$\dot{x} = A(t)\,x, \quad y(t) = O(t)\,x$$

is observable, then from the last equation of the array of equations,

$$\int_{\theta}^{t} X^*(\tau, t)\,O^*(\tau)\,y(\tau)\,d\tau = \Phi(\theta, t)\,x(t) - \int_{\theta}^{t} X^*(s, t)\,\Phi(\theta, s)\,f(s, x(s))\,ds$$

we obtain :

$$x(t) = \Phi^{-1}(\theta, t) \int_{\theta}^{t} X^*(s, t)\,O^*(s)\,y(s)\,ds + \Phi^{-1}(\theta, t) \int_{\theta}^{t} X^*(s, t)\,\Phi(\theta, s)\,f(s, x(s))\,ds \tag{15}$$

If we assign :

$$U_1(t, \theta, s) = \Phi^{-1}(\theta, t)\,X^*(s, t)\,O^*(s)$$

and

$$U_2(t, \theta, s) = \Phi^{-1}(\theta, t)\,X^*(s, t)\,\Phi(\theta, s)$$

then the following compact relation can be obtained :

$$x(t) = \int_{\theta}^{t} U_1(t, \theta, s)\,y(s)\,ds + \int_{\theta}^{t} U_2(t, \theta, s)\,f(s, x(s))\,ds \tag{16}$$

Equation-16 represents the relation of the unknown state $x$ with the observed output $y$ over the interval
$[\theta, t]$.
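
To make the reconstruction formula concrete, the sketch below evaluates its first term for the special case $f = 0$, where the state is recovered as $x(t) = \Phi^{-1}(\theta, t) \int_{\theta}^{t} X^*(s, t)\,O^*(s)\,y(s)\,ds$. Everything specific here is an assumption made for illustration: the toy system $A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ with $O = (1, 0)$, the synthetic output generated from a chosen "true" state, the midpoint-rule quadrature, and the names `output_row` and `reconstruct_state`.

```python
import math

def output_row(s, t):
    """O X(s, t) for the toy system A = [[0, 1], [-1, 0]], O = [1, 0] (closed form)."""
    d = s - t
    return [math.cos(d), math.sin(d)]

def reconstruct_state(y, theta, t, steps=2000):
    """Estimate x(t) from the output y on [theta, t], assuming f = 0."""
    h = (t - theta) / steps
    G = [[0.0, 0.0], [0.0, 0.0]]  # Phi(theta, t)
    b = [0.0, 0.0]                # integral of X*(s, t) O* y(s) ds
    for k in range(steps):
        s = theta + (k + 0.5) * h
        c = output_row(s, t)
        ys = y(s)
        for i in range(2):
            b[i] += c[i] * ys * h
            for j in range(2):
                G[i][j] += c[i] * c[j] * h
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    # Apply the explicit inverse of the 2x2 Gramian to b.
    return [(G[1][1] * b[0] - G[0][1] * b[1]) / det,
            (-G[1][0] * b[0] + G[0][0] * b[1]) / det]

theta, t = 0.0, 1.5
x_true = [0.7, -0.2]  # hypothetical state at time t, used only to synthesize y
y = lambda s: sum(c * x for c, x in zip(output_row(s, t), x_true))
x_est = reconstruct_state(y, theta, t)
print(x_est)
```

Although only one component is observed at each instant, the full two-component state at time $t$ is recovered from the output history, because the Gramian over $[\theta, t]$ is invertible for this toy system. With a non-zero $f$, the second term of the relation makes the equation implicit in $x$.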



In eq. (16) the time $\theta$ is not necessarily fixed, and therefore $\theta$ can be replaced by $\tau$. Upon
carrying out this change, eq. (16) can be substituted into eq. (13), and we obtain :

$$x(\tau) = X(\tau, t) \int_{\tau}^{t} U_1(t, \tau, s)\,y(s)\,ds + X(\tau, t) \int_{\tau}^{t} U_2(t, \tau, s)\,f(s, x(s))\,ds - \int_{\tau}^{t} X(\tau, s)\,f(s, x(s))\,ds \tag{17}$$

In compact form,

$$x(\tau) = \int_{\tau}^{t} U_3(t, \tau, s)\,y(s)\,ds + \int_{\tau}^{t} U_4(t, \tau, s)\,f(s, x(s))\,ds \quad \text{for } (\tau < t) \tag{18}$$

where $U_3(t, \tau, s) = X(\tau, t)\,U_1(t, \tau, s)$,

and $U_4(t, \tau, s) = X(\tau, t)\,U_2(t, \tau, s) - X(\tau, s)$

On the basis of the derivations above we can put forward a proposition :
NL-Proposition-1) :

Under the condition that the system (protein interior) is describable by the differential equation
$\dot{x}(t) = A(t)\,x(t) + f(t, x)$ and the observation $y(t) = O(t)\,x$, it is globally

a) observable at any time instance $t$,

b) completely observable, or

c) differentially observable, if the following conditions hold :

1) there exists a constant $c > 0$ such that $\det \Phi(\theta, t) > c$.

2) Eq. (16) has a unique solution for any $y$ which is continuous on $[\theta, t]$: a) for some $\theta < t$, in
the case of a system observable at time $t$; b) for all $t$ and for some $\theta < t$, in the case of a completely
observable system; or c) for all $t$ and for all $\theta < t$, in the case of a differentially observable system.

A careful reading of the situation suggests that eqn-16 in 'NL-Proposition-1' can be replaced by eqn-17. With
such a replacement the same results remain valid, after a simple change of variables. However, the question of
whether eqn-16 or eqn-17 has a unique solution is difficult to evaluate. The difficulty arises because, if the
interval of integration in eqn-16 or eqn-17 is suitably changed, either equation may be regarded as a nonlinear
operator equation on a space of continuous functions. Thus, we have to apply Banach's contraction mapping
theorem to these nonlinear equations.
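
The role of the contraction mapping theorem can be illustrated numerically. The sketch below is not the protein model itself: it applies successive approximation to a toy scalar equation x(t) = g(t) + eps * int_0^t sin(x(s)) ds, in which the forcing g, the nonlinearity sin (Lipschitz constant 1), the grid size and the function name `picard_iterate` are all illustrative assumptions. When eps times the interval length is below 1, the sup-norm distance between consecutive iterates shrinks by at least that factor at every sweep, which is the mechanism that yields uniqueness.

```python
import math

def picard_iterate(g, eps, t_end=1.0, n=200, sweeps=30):
    """Successive approximation for x(t) = g(t) + eps * int_0^t sin(x(s)) ds.

    Returns the final iterate on a uniform grid and the sup-norm change
    between consecutive iterates (one value per sweep).
    """
    h = t_end / n
    grid = [i * h for i in range(n + 1)]
    x = [g(t) for t in grid]          # initial guess: x_0 = g
    changes = []
    for _ in range(sweeps):
        new_x = []
        integral = 0.0
        for i, t in enumerate(grid):
            if i > 0:
                # trapezoidal rule on [t_{i-1}, t_i], using the previous iterate
                integral += 0.5 * h * (math.sin(x[i - 1]) + math.sin(x[i]))
            new_x.append(g(t) + eps * integral)
        changes.append(max(abs(a - b) for a, b in zip(new_x, x)))
        x = new_x
    return x, changes

if __name__ == "__main__":
    # Hypothetical forcing g(t) = 1 + t; eps * t_end = 0.5 < 1.
    _, changes = picard_iterate(lambda t: 1.0 + t, eps=0.5)
    print(changes[:5])
```

The printed sequence of changes decreases monotonically; choosing eps so that eps * t_end exceeds 1 removes the guarantee, although this particular Volterra-type example may still converge.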

Without venturing into the general case, we consider a special system described by : 



$$ \dot{x} = A(t)\,x + \epsilon f(t,x) \tag{19} $$
$$ y = \Theta(t)\,x \tag{20} $$

where $\epsilon$ is a positive scalar constant. We assume that the following condition is satisfied :

$$ \| f(t,x_1) - f(t,x_2) \| \le K \| x_1 - x_2 \|, \qquad (K > 0) \tag{21} $$



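A general solution x(t) of eqn-19, with $x(\tau)$ as a formal initial condition, is:

$$ x(t) = X(t,\tau)\,x(\tau) + \epsilon \int_\tau^t X(t,s)\,f(s,x(s))\,ds \tag{22} $$

Starting from eqn-22, we create an analogue of the derivation of eqn-16 from eqn-13.
Therefore :

$$ x(t) = \Phi^{-1}(\theta,t) \int_\theta^t X^*(s,t)\,\Theta^*(s)\,y(s)\,ds + \epsilon\,\Phi^{-1}(\theta,t) \int_\theta^t X^*(s,t)\,\Phi(\theta,s)\,f(s,x(s))\,ds $$

Substituting this relation into eqn-22, we obtain :

$$ x(\tau) = X(\tau,t)\,\Phi^{-1}(\theta,t) \int_\theta^t X^*(s,t)\,\Theta^*(s)\,y(s)\,ds + \epsilon\,X(\tau,t)\,\Phi^{-1}(\theta,t) \int_\theta^t X^*(s,t)\,\Phi(\theta,s)\,f(s,x(s))\,ds - \epsilon \int_\tau^t X(\tau,s)\,f(s,x(s))\,ds \qquad (\theta \le \tau \le t) \tag{23} $$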

We denote this condition as 'NL-cond-1'. 

Consequently, in order for the system described by eqn-19 and eqn-20 to be observable, it is sufficient that the
inverse of $\Phi(\theta,t)$ exists and that the solution of eqn-23 exists and is unique.

At this point, if we assume that there exist two solutions $x_1, x_2$ ($x_1 \ne x_2$) of eqn-23 for a given y, then, using
eqn-21, we obtain :

$$ |x_1(\tau) - x_2(\tau)| \le \epsilon \int_\tau^t |X(\tau,s)|\,K\,|x_1(s) - x_2(s)|\,ds + \epsilon\,|X(\tau,t)\,\Phi^{-1}(\theta,t)| \int_\theta^t |X^*(s,t)\,\Phi(\theta,s)|\,K\,|x_1(s) - x_2(s)|\,ds $$
$$ \le \epsilon\,k_1(t,\theta)\,(t-\tau)\,\| x_1 - x_2 \| + \epsilon\,k_2(t,\theta)\,(t-\theta)\,\| x_1 - x_2 \| $$

where

$$ k_1(t,\theta) = \max_{\theta \le \tau \le s \le t} |X(\tau,s)|\,K, \qquad k_2(t,\theta) = \max_{\theta \le \tau \le s \le t} |X(\tau,t)\,\Phi^{-1}(\theta,t)|\,|X^*(s,t)\,\Phi(\theta,s)|\,K $$

Since $t - \tau \le t - \theta$, there exists a $k(t,\theta)$ such that :

$$ \| x_1 - x_2 \| \le \epsilon\,k(t,\theta)\,(t-\theta)\,\| x_1 - x_2 \| \tag{24} $$

where $k(t,\theta) = k_1(t,\theta) + k_2(t,\theta)$.

Hence, most importantly, if $\epsilon$ satisfies the inequality :

$$ \epsilon\,k(t,\theta)\,(t-\theta) < 1 \tag{25} $$

it follows that $x_1 = x_2$ on $[\theta, t]$.

This contradicts the assumption $x_1 \ne x_2$ and leads to the next proposition, a sufficient condition for the
observability of the system described by eqn-19 and eqn-20, since the condition 'NL-cond-1' necessarily guarantees
the existence of solutions of eqn-23.
Hence, finally we have :
NL-Proposition-2) :

The system described by eqn-19 and eqn-20 is globally a) observable at the instance t, b) completely observable
or c) differentially observable, if the following conditions hold :

1) there exists a constant c > 0, such that $\det \Phi(\theta,t) > c$.

2) a positive constant, $\epsilon$, satisfies
$$ \epsilon < \frac{1}{k(t,\theta)\,(t-\theta)} $$

a) for some $\theta < t$, in the case of an observable system at time t,

b) for all t and for some $\theta < t$, in the case of a completely observable system, and

c) for all t and for all $\theta < t$, in the case of a differentially observable system.
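
Condition 2 can be evaluated numerically once the transition function and $\Phi$ are available. The following is a minimal sketch for a scalar test system only, under assumptions that are not taken from the derivation above: constant coefficients a and c, transition function X(t, s) = exp(a (t - s)), and a Gramian-type form Phi(theta, s) = int_theta^s c^2 exp(2 a (sigma - s)) d sigma (the definition of $\Phi$ lies outside this passage). The function name `epsilon_bound` and the grid search for the maxima are illustrative choices.

```python
import math

def epsilon_bound(a, c, K, theta, t, n=200):
    """Upper bound 1 / (k(t, theta) * (t - theta)) for a scalar test system.

    Assumed (hypothetical) setting: dx/dt = a*x + eps*f(t, x), y = c*x,
    X(p, q) = exp(a*(p - q)), and
    Phi(theta, s) = int_theta^s c^2 * exp(2*a*(sigma - s)) d sigma.
    """
    if t <= theta:
        raise ValueError("need theta < t")

    def X(p, q):
        return math.exp(a * (p - q))

    def phi(lo, hi):
        if a == 0.0:
            return c * c * (hi - lo)
        return c * c * (1.0 - math.exp(-2.0 * a * (hi - lo))) / (2.0 * a)

    phi_full = phi(theta, t)
    if phi_full <= 0.0:
        raise ValueError("Phi(theta, t) must be positive (condition 1)")

    grid = [theta + (t - theta) * i / n for i in range(n + 1)]
    k1 = 0.0
    k2 = 0.0
    for i, tau in enumerate(grid):
        for s in grid[i:]:            # enforce theta <= tau <= s <= t
            k1 = max(k1, abs(X(tau, s)) * K)
            k2 = max(k2, abs(X(tau, t) / phi_full) * abs(X(s, t) * phi(theta, s)) * K)
    return 1.0 / ((k1 + k2) * (t - theta))

if __name__ == "__main__":
    # Any eps below this value satisfies inequality (25) for the toy system.
    print(epsilon_bound(a=0.0, c=1.0, K=1.0, theta=0.0, t=1.0))
```

The bound tightens as the Lipschitz constant K or the observation window t - theta grows, which matches the structure of inequality (25).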



3 Results and Discussion : 

(Applicability of the algorithm on four different spheres in 
Protein Biophysics) : 

The present algorithm proposes a rigorous and reliable template for a series of algorithms that can be constructed to
study the emergence of various biophysical properties. However, the actual implementation of these ideas may require
computational facilities that are not easily available at present, although access to such facilities in the near
future seems realistic. Owing to the prohibitive computational cost, these algorithms could not be implemented in
the present work. In the absence of computed data, this section presents explicit schemes for studying the emergence
of actual biophysical properties from the mathematical framework presented above. Out of the many possible
applications of this algorithm, we choose four paradigms on which the present study can have a tangible impact.
For each of these (extremely well-studied) spheres of protein biophysics, the applicability of the present algorithm
is stated, alongside the practical benefits that its application can provide.



Applicability - 1) Case of Hydrophobicity : 

1.1) Scope of applicability of the present algorithm : 

The origin of hydrophobicity from a bottom-up perspective has been a subject of fascination and considerable debate
among physicists and chemists for the last thirty years. The spectrum of views, debates and (possible) confusions
regarding the various ways of defining hydrophobicity and solvation, the origin of hydrophobicity, and the (possible)
correlation between hydrophobicity and polarizability in different molecules under different boundary conditions
is well documented [55]. Astonishment at our lack of understanding of the molecular mechanism causing
hydrophobicity, as expressed in a recent work [56], is therefore justified. However, it is neither within the scope
nor the motivation of the present work to reflect upon these opinions. We start our argument by noting that none of
the proposed theories has questioned that the origin of hydrophobicity can be traced back to some kind of
inter-atomic interaction. It is equally clear that these inter-atomic interactions are bound to cause fluctuations
in the spatial coordinates of the atoms. The present work attempts to derive the conditions under which information
about the extent of fluctuations in the spatial coordinates of a subset of atoms can be inferred from macroscopically
measured parameters. For example, if one hypothesizes that the origin of hydrophobicity can be studied from the
collective effects of dispersion forces in the interior and exterior of the protein, one can find the number of
inter-atomic interactions that account for the emergence of hydrophobicity. More interestingly, the starting point
of such a study does not involve any guesswork concerning the number of atoms; it is some macroscopic measure of
hydrophobicity (say, the hydrophobic fractal dimension [19] or the hydrophobic moment of a protein [57]), and the
number of atoms giving rise to it is obtained as the result. Concretely, the user assigns the measured
(computationally or experimentally) change of (cumulative) hydrophobicity content (as provided by the constructs
proposed in [19] or [57]) due to any set of atoms to the variable $\dot{x}$; the contact-map information for these
atoms to the matrix A; and the dependency of hydrophobicity on (quantum mechanically calculated) partial charges,
or on any other parameter that the user considers necessary to describe dispersion-related effects, to the
(context-dependent) function f(x, t) of eqn-4. Alternatively, the user can assign the (time-dependent) coordinate
information of the relevant set of atoms to the variable x and, keeping the rest of the parameters invariant,
attempt to measure the change of hydrophobicity content by examining the magnitude of $\dot{x}$. Owing to the high
degree of flexibility inherent in the algorithm, the effectiveness of the proposed scheme rests primarily on the
discretion of the user, who must choose appropriate parameters to assign to the function f(x, t). (As an example,
instead of a (subjective) choice of parameters that provides a framework to study the effect of dispersion-related
forces on hydrophobicity, the user could assign atomic hydrophobicity magnitudes [120] to every atom; the change in
hydrophobicity content would then be measured differently.) The utility of the present scheme therefore lies in the
fact that it can study how a macroscopic property of thermodynamic nature (hydrophobicity) emerges from the
nano-scale (a small number of distinguishable atoms). Put differently, the algorithm can predict how much of the
inter-atomic fluctuation is observable from the known (macroscopic) index of hydrophobicity. In still other words,
the present methodology can be used to ascertain the precise lower limit of the number of atoms necessary to observe
the emergence of hydrophobicity. The mathematical construct presented here is general; for any particular problem,
only the aspect of it pertinent to that problem needs to be considered.
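
The assignment of quantities to x, A and f described above can be sketched in code. The example below is a hedged illustration only: it assumes that eqn-4 has the form dx/dt = A x + f(x, t) quoted in NL-Proposition-1, uses a simple distance cutoff to build a binary contact map, and integrates with a forward-Euler step. The coordinates, the per-atom values in `x0`, the cubic damping used for f, and the helper names `contact_map` and `integrate` are all hypothetical placeholders, not data or methods from this work.

```python
import math

def contact_map(coords, cutoff=5.0):
    """Binary contact map: 1.0 if two distinct atoms lie within `cutoff`."""
    n = len(coords)
    return [[1.0 if i != j and math.dist(coords[i], coords[j]) <= cutoff else 0.0
             for j in range(n)] for i in range(n)]

def integrate(x0, A, f, dt=0.01, steps=100):
    """Forward-Euler integration of dx/dt = A x + f(x, t)."""
    x = list(x0)
    t = 0.0
    for _ in range(steps):
        fx = f(x, t)
        dx = [sum(A[i][j] * x[j] for j in range(len(x))) + fx[i]
              for i in range(len(x))]
        x = [xi + dt * di for xi, di in zip(x, dx)]
        t += dt
    return x

if __name__ == "__main__":
    # Placeholder atom coordinates and per-atom hydrophobicity-like values.
    coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    x0 = [0.2, -0.1, 0.4]
    A = contact_map(coords)
    # Hypothetical context-dependent nonlinearity: cubic damping.
    f = lambda x, t: [-0.1 * xi ** 3 for xi in x]
    print(integrate(x0, A, f))
```

In practice the user would replace the cutoff-based map and the placeholder nonlinearity with the contact information and parameter dependencies appropriate to the property under study.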

Such categorical information about the number and character of the atom cluster that produces hydrophobicity
through its inter-atomic interactions becomes indispensable for probing recent questions regarding the nature
of hydrophobicity. For example, a steady flow of opinions over the last decade has argued that the hydrophobic
effect is not necessarily an entropic phenomenon; it can be enthalpic or entropic depending on the temperature and
the geometric characteristics of the solute [58-61]. The difficulty in attempting this problem stems primarily from
an inherent contradiction: geometrical (distinguishable-object based) descriptions and thermodynamic (statistical)
descriptions work at two different levels of systemic description. While the former is primarily bottom-up
(nano-scale) in nature, the latter is top-down (macroscopic). The present theory provides a quantitative tool-set
to examine the emerging properties in their nascent form at the mesoscopic scale.

1.2) Scope of applicability of the present algorithm : 

It has been found experimentally that the relationship between the "bulk hydrophobic interaction" (exposure of
hydrophobic residues from their pure phase to water) and the "pair hydrophobic interaction" potential of mean force
(PMF) in water is nonlinear [62, 63]. Although various experimental studies from varying perspectives (studies
related to virial coefficients, Kirkwood-Buff integrals, and related spatially integrated quantities [62-65]) have
examined the nature of hydrophobicity, and many have attempted to focus purely on the multiple facets of the PMF
[66-68], the spatial dependence of the PMF itself is difficult to obtain through direct experimental means. Here, to
determine the spatial extent of the PMF, rather than resorting to simulation-centric studies, we can resort to the
mathematical treatment presented here. By describing the fluctuating magnitude of the temperature-dependent spatial
extent of the PMF with the variable x, the (time-dependent) contact map with the matrix A, and by entering the
expression for the dependency of the exposure of hydrophobic residues from their pure phase to water on the
(time-dependent and context-dependent) "pair hydrophobic interaction" PMF of the residues in eqn-4, a consistent
scheme can be constructed to solve for the spatial extent of the PMF for the concerned set of residues. As mentioned
before, the scheme constructed in this work is a general one, and its effectiveness depends crucially on a judicious
choice of the parameters assigned to the various terms in eqn-4 (or in eqn-1, if the description is linear).

Applicability - 2) Case of polarizability : 

2.1) Scope of applicability of the present algorithm : 

A low polarizability in the interior of the protein implies a low dielectric constant, which in turn implies a
conducive environment for electrostatic interactions. A detailed account of the electrostatic interactions of
ionized side chains is a necessary requirement for any serious examination of protein stability, because these
interactions are comparatively long-ranged among biophysical forces [69]. All pH-dependent properties of proteins
are governed by the electrostatic interactions between ionizable side chains. Owing to this coupling to chemical
protonation equilibria, protein electrostatics can be probed directly through measurements of pKa values [70-74].
The effect of electrostatic interactions is usually quantified in terms of the shift $\Delta pK_a$ of the pKa value
of an ionizable group in a protein relative to the pKa value of the same group in a small reference molecule in
dilute aqueous solution.
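
The definition of the shift can be made concrete with a short worked example. The sketch below computes the shift as the difference between the pKa in the protein and the pKa in the reference molecule, and uses the standard Henderson-Hasselbalch relation to show how such a shift changes the protonated fraction at a given pH. The numerical pKa values in the usage lines are hypothetical placeholders, and the function names are illustrative.

```python
def delta_pka(pka_protein, pka_reference):
    """Shift of the pKa in the protein relative to the reference molecule."""
    return pka_protein - pka_reference

def fraction_protonated(pH, pka):
    """Henderson-Hasselbalch fraction of the protonated form at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (pH - pka))

if __name__ == "__main__":
    # Hypothetical values for a buried acidic group and its reference compound.
    pka_ref, pka_prot = 4.0, 6.5
    print("delta pKa:", delta_pka(pka_prot, pka_ref))
    print("protonated fraction at pH 7 (reference):", fraction_protonated(7.0, pka_ref))
    print("protonated fraction at pH 7 (protein):  ", fraction_protonated(7.0, pka_prot))
```

A positive shift for an acidic group means that it remains protonated over a wider pH range than it would in dilute aqueous solution.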

Many aspects of protein pKa shifts are known. However, a solution to the basic inverse problem (P-1), viz., given a
particular magnitude of $\Delta pK_a$ for residues either on the surface or in the interior, what is the minimum
number of residues that might produce it?, is not easily obtainable in a general sense. Furthermore, despite immense
efforts, computational and/or theoretical approaches that can reliably predict the large pKa shifts observed for
buried residues in a general sense remain difficult to find. While one aspect of these problems lies in the
computational difficulty of accounting for ionization-induced water penetration and conformational changes in pKa
calculations, the other aspect points to the lack of a theoretically sound scheme that can describe the emergence
of a macroscopic property from its inception through its subsequent phases, as the number of residues that
contribute to the property increases over time. Assigning the pKa values of every amino acid (in the subset of
amino acids under consideration) to the variable x, incorporating the contact-map information in the matrix A, and
describing desolvation or conformational fluctuations (or any other parameter that the user thinks necessary) in
the context-dependent (non-linear) function f in eqn-4, one can attempt to obtain a quantitative magnitude for
$\Delta pK_a$ as the output $\dot{x}$. The need for such a thorough scheme becomes even more acute when one examines
the uniform dielectric continuum model of protein interior electrostatics. In such a model, the entire effect of
polarizability is described through a single dielectric constant (DC). As a result, the electrostatically highly
heterogeneous [69] and anisotropic [75] protein interior is represented through an overly simplified construct. The
shortcomings of such a model have been commented upon by many [69,72,76]. The magnitude of the DC becomes a
complex function of the extent to which formal charges, partial charges, and dipoles [72] are considered. The
effective DC is calculated from the response of the entire protein to an externally applied electric field, which in
turn is calculated computationally from the total dipole moment fluctuation of the protein. However, it has been
reported by many that the magnitude of the dipole moment fluctuation is significantly affected by charged surface
residues [77-79]. Hence, how accurately such a construct accounts for the self-energy of deeply buried ionizable
residues remains unclear. That said, many studies over the years have (successfully) addressed various aspects of
the aforementioned issues. Rigorous computational examinations of pKa shifts have been undertaken within frameworks
ranging from macroscopic dielectric continuum models, via semi-macroscopic partial charge [79, 80] and lattice
dipole [72] models, to all-atom simulations [74]. However, the answer to another basic and general inverse problem
(P-2), namely, the electrostatic effects of how many buried residues are reflected in a measured magnitude of the
dipole moment fluctuation of the protein, could not be found within the purview of the aforementioned studies.

The commonality between P-1 and P-2 is striking. Both ask extremely basic questions. Computational constructs,
regardless of how sophisticated they are, cannot provide the answers to them. Instead, a thorough mathematical
model, which treats the protein interior as a fluctuating, nonlinear (the DC behaves in a nonlinear manner [81])
and time-dependent system, might help in finding the answers to these basic inverse problems. The mathematical
model presented in this work attempts to achieve precisely that.

2.2) Scope of applicability of the present algorithm : 

The importance of identifying the lower-threshold number of atoms for understanding the origin and nature of
biophysical forces is multifaceted. We elaborate on it in the context of protein-protein interaction. This complex
process is driven primarily by the hydrophobic effect and van der Waals interactions, with significant contributions
from entropy and electrostatics. However, a general criterion for the extent of each contribution under varying
biological contexts is extremely difficult to establish. Examination of the roles of the various components of
electrostatic interactions during protein-protein interaction is inherently difficult because of their (non-linear)
dependence on pH and salt concentration [82]. To this end, the finding that the salt dependence of binding is not
correlated with macroscopic parameters of the monomers [83,84] merely underlines the importance of categorically
identifying the number of atoms necessary to account for the origin of 'macroscopic' properties. Furthermore,
description of the process acquires a new level of difficulty when one notes the recent finding [76] that
homo-complexes and hetero-complexes adopt opposite schemes of electrostatics during their formation. (For
homo-complexes, contrary to intuitive notions, the electrostatics opposes binding in the majority of cases; whereas
for hetero-complexes its role resembles that of salt bridges in protein stability, i.e., in some cases it favors
binding, while in others it opposes binding.) Although this apparent contradiction can be resolved to some extent by
analyzing the charged-residue density in the interface for the two classes [76], the construction of a general
scheme to describe the electrostatics of protein-protein interaction remains difficult. In the absence of a
theoretically derived condition that unambiguously demarcates the origin of the several components of electrostatic
contributions in a generalized way, it has been reported that in certain cases electrostatic energy favors binding,
while in other cases it opposes binding [76]. Furthermore, despite the consensus on the significance of specific
pair-wise electrostatic interactions across the interface [85-91], the conclusions about the role of electrostatics
in binding affinity remain controversial. That said, we assert that this confusion over the comparative
contributions of various aspects of electrostatic forces to binding affinity can be resolved objectively. Since all
of the aforementioned properties are macroscopic (or at least mesoscopic) in origin, an unambiguous scheme to
describe the entire situation can only be found when we can evaluate the lower limit at which these properties come
into being. It is in such a pursuit that the evolution of each of these properties over time and under a nonlinear
context dependence can be studied, which might ultimately provide objective information regarding which component
contributes how much, and due to how many atoms. The scheme proposed here achieves precisely this through the
differential eqn-4, by assigning residue-specific atomic hydrophobicity values (or partial electrostatic charges, if
one attempts to study electrostatic contributions) to the variable x, the contact-map information to the matrix A,
and known dependencies that describe the effect of hydrophobicity (or electrostatics) on binding free energy to the
function f(x, t).

2.3) Scope of applicability of the present algorithm : 

Another example from polarizability studies might help in registering the significance of the present theory of
biological observability. Poisson-Boltzmann (PB) theory is a statistical mean-field theory that characterizes
coarse-grained quantities such as the average particle distribution function and the electrostatic potential,
together with thermodynamic variables, in systems composed of many charged, point-like particles at thermal
equilibrium. However, despite various modifications of the scheme in the dilute and strong coupling regimes [92, 93]
and numerous imaginative applications of it to several paradigms of macromolecular biological structures [94-96],
the statistical modeling of real solutions, which often lie in an intermediate regime, is still an open problem
[93,97]. It is an open problem because the precise threshold at which the emergence of the statistical properties
takes place is not clear. Taking recourse to the present theory, we can attempt to achieve clarity in our
description of the aforementioned problem.

2.4) Scope of applicability of the present algorithm : 

Accurate measurement of residual dipolar couplings in weakly aligned proteins can in principle provide incisive
information about their structure and dynamics in the solution state [98]. The problem stems from the nature of the
measured information, which usually embodies a convolution of structural and dynamic properties [99]. Among many
other aspects, the sensitivity of residual dipolar couplings to internal motions has been recognized by many as an
enormously interesting question [100-103]. This is so because, unlike conventional spin relaxation and chemical
exchange-based studies, residual dipolar couplings are sensitive to motions spanning a wide range of time scales,
and hence they might be considered potent probes for monitoring biologically relevant motions [100]. However, many
structure refinement protocols for analyzing dipolar data implicitly assume that internal motions are either absent,
negligibly small, or uniform and axially symmetric in nature [104, 105]. Although some sporadic attempts have been
made to study dynamics in the protein interior, a rigorous theoretical framework that solves the inverse problem
(P-3), namely, given the information regarding residual dipolar couplings, to what extent can we observe the
dynamics of protein atoms, has not been proposed. It is not an easy question, because these motions are not
universally reflected in specific dipolar couplings in the presence of high anisotropy [99]; but a carefully
designed mathematical framework with a top-down philosophy, which can encompass physical perturbations without
delving into the bottom-up origins of the latter, can attempt to describe the situation with adequate accuracy. The
scope of this question can be meaningfully extended if we note that the interference of existing internal motions
with structural interpretations of dipolar data has not been studied with sufficient thoroughness. A study of a
particular case [99] tends to suggest that some internal motions can well fall within the range of observability.
Here, in the context of the present algorithm, by describing the change in the magnitude of the (emergent) dipolar
coupling as the dependent variable of eqn-4, the contact-map information for neighboring atoms in the matrix A, and
relevant information (either the residual pKa values, or some cumulative measure of atomic partial charges at the
residue level, or any other parameter set that the user thinks pertinent) in f(x, t), one can attempt to observe at
which threshold number of atoms the measured magnitude of the residual dipolar coupling emerges.

Applicability - 3) Case of NMR : 

Biomolecular structure determination through NMR starts with the collection of proton spectra. Protons in the
various residues have different electronic environments [106]. The characteristic dispositions of the electron cloud
give rise to local magnetic fields which alter the static field B to $B(1 - \sigma)$, thereby altering the Larmor
frequency of these protons. The resulting proton spectrum comprises many peaks. These shifts in the Larmor frequency
are characteristic of the chemical environment of the spins and are termed chemical shifts (CS). Proteins are known
to be flexible [3]; in solution they undergo constant small conformational changes, and since the CSs are affected
by tertiary structure [107-109], we can regard the chemical shifts as dynamic (time-varying). However, chemical
shifts are typically viewed as a static property [110], largely due to the tools employed in traditional NMR
analysis. The NMR spectrometer records a series of time-domain signals, known as Free Induction Decays (FIDs).
A given atom's CS is encoded as a periodicity within the FIDs and is obtained by applying a Fourier Transform to
the FIDs. FIDs, being time-domain signals, are capable of encoding time-varying CSs; however, it is not possible to
observe the time variation using the Fourier Transform, because the integration takes place over time. Nevertheless,
based on the CS information, it is possible to assign the various peaks in the spectrum of a molecule to various
protons, a process called frequency labeling [111]. In theory, using these shifts, the peaks in the spectrum can be
uniquely assigned to protons in the various amino acids. Following this, a series of experiments is used to
selectively excite protons and study the effect on other protons. This effect (the Nuclear Overhauser effect) is
proportional to $r^{-6}$, where r is the distance between nuclei. The resulting data set then provides distance
information between protons of various amino acids. Once these distances are known, one can solve for a folded
configuration of the protein that satisfies these distance constraints. The practical scenario, on the other hand,
presents a different picture, since the proton spectrum of a large protein molecule has poor resolution [52]
(because the large number of protons leads to crowding of the spectra). Owing to this poor resolution, the task of
frequency labeling becomes extremely difficult.
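
Two relations used in the description above can be written out as a short worked example: the shielded Larmor frequency, proportional to B(1 - sigma), and the distance estimate that follows from the r^-6 dependence of the Nuclear Overhauser effect. The sketch assumes the usual isolated spin-pair calibration against a reference proton pair of known distance; the proton gyromagnetic ratio is the approximate textbook value, and all numerical inputs and function names are hypothetical illustrations.

```python
GAMMA_H_OVER_2PI_MHZ_PER_T = 42.577  # approximate proton gyromagnetic ratio / (2*pi)

def larmor_mhz(b_tesla, sigma):
    """Larmor frequency (MHz) of a proton shielded by sigma in a static field B."""
    return GAMMA_H_OVER_2PI_MHZ_PER_T * b_tesla * (1.0 - sigma)

def chemical_shift_ppm(sigma, sigma_ref):
    """Chemical shift relative to a reference shielding, in parts per million."""
    return 1e6 * (sigma_ref - sigma) / (1.0 - sigma_ref)

def noe_distance(intensity, ref_intensity, ref_distance):
    """Distance from NOE intensity, assuming intensity proportional to r**-6."""
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)

if __name__ == "__main__":
    # Hypothetical shielding constants and NOE intensities.
    print(larmor_mhz(14.1, 25e-6))
    print(chemical_shift_ppm(sigma=25e-6, sigma_ref=33e-6))
    print(noe_distance(intensity=1.0, ref_intensity=64.0, ref_distance=2.0))
```

Because of the sixth-root dependence, a 64-fold drop in NOE intensity corresponds to only a doubling of the inter-proton distance, which is why NOE data constrain short distances far better than long ones.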

With the advent of 2D NMR [112] (where the connectivity between distinct individual spins is delineated and,
furthermore, the resonance peaks are spread out in two dimensions, leading to a substantial improvement in peak
separation and making the spectra far easier to interpret), the aforementioned problem can be tackled. But the
basic inverse problem (P-4) still lingers, viz., what is the upper limit of the number of residues (and atoms
therein) such that, given a proton spectrum, frequency labeling can be achieved? Appropriate use of the proposed
scheme might help in a more focused framing of (P-4) before an attempt is made to solve it.
However, we must mention here that control-theoretic applications to NMR studies are not new [52], and many
successful constructs have been proposed that rely on purely mathematical studies with basic knowledge of physics
and involve little or no computational effort. Nevertheless, in terms of scope and orientation, the approach
proposed in this work is the first of its kind.

Applicability - 4) Case of Drug-Discovery and Computational-Chemistry : 

In the paradigm of drug discovery (and computational chemistry in general), an outstanding problem can be stated
as follows: given information about the structure of a protein active site and a list of potential small-molecule
ligands, predict the binding mode and estimate the binding affinity for each ligand [113]. This problem has multiple
aspects and has been a field of intense computational and experimental analysis over the last fifteen years.
However, certain basic questions in this paradigm remain unanswered, and in the absence of a theoretically deduced,
unambiguous set of criteria, approaches to these questions often provide inconsistent results [114]. The entire
operation of docking can be summarized in two operations: first, "posing", viz., the appropriate positioning of the
correct conformer of a ligand in the active site (a combination of conformation and orientation being known as a
"pose"); second, "scoring", where poses are selected and ranked with respect to some scoring function [113, 114].
Although apparently straightforward, this two-step process involves many complex (non-linear) physico-chemical
interactions from a geometric perspective, and in the bid to address these issues simultaneously, many
approximations and inadequate constructs are resorted to. These have been identified in many recent works [114-117]
(for example, simplistic treatment of electrostatics, electronic polarization, aqueous desolvation, and ionic
influences; lack of accounting for entropy changes in the protein and the ligand on binding; inadequate weighting
of proton positions (tautomers, rotamers) and charge states (ionization) of both protein and ligand; and the
assumption, in many (but not all) cases, that the active site is rigid (possibly including tightly bound water
molecules) and that only the small molecule can move). A close scrutiny of these drawbacks points immediately to an
underlying connecting factor. Many of these shortcomings exist because the precise mode of emergence of these
features from a certain number of atoms is not known; furthermore, for the same reason, the (non-linear)
dependencies that these properties might have on one another are difficult to decipher. A solution to these problems
can be found by examining the situation from a coherent perspective in which the distinguishability of a
non-statistical (non-macroscopic, non-thermodynamic) system of atoms is ensured while, at the same time, conditions
for observability of the emergence of macroscopic (statistical) properties are appropriately identified. The
control-theoretic approach presented in this work might help in quantifying many of these features from the
perspective of the inverse problem, where the atomic origin of these features is addressed from a bottom-up
mathematical standpoint without delving into the depths of physical and/or chemical dependencies. For example, a
computational framework can be set up in which the user assigns the measured (computational or experimental) change
of entropy of either protein or ligand to the variable $\dot{x}$; the neighboring-atom information in the form of a
contact map to the matrix A; the conformational-fluctuation information for each amino acid involved to x; and,
finally, some relevant parameter that maps residue-level conformational fluctuations to (macroscopic) entropy to the
nonlinear context-dependent function f(x, t). (Alternatively, the cumulative effect of conformational fluctuations,
expressed as the internal energy of the set of atoms under consideration, can be mapped to entropy and assigned to
f(x, t).) With such a scheme, studies going from a known set of $\dot{x}$ values to the corresponding x, or, more
directly, from known x to unknown $\dot{x}$, can be attempted. While an expert in this sphere of knowledge can
ascertain the feasibility of attempting particular problems, the general mathematical framework stays valid for
every set of parameters.

On the other hand, scoring functions serve as objective functions to classify diverse poses of a single ligand 
in the receptor binding site, before the binding affinities of different receptor-ligand complexes are estimated 
(and ranked) once the docking of a compound database has been performed [118]. Interestingly, the drawbacks of 
scoring functions, as elaborated in a recent work [119] (failure to accommodate subtle physical effects affecting 
the experimental binding energy; viz., the treatment of polar groups in the ligand or the protein that are 
desolvated upon binding but fail to find a matching polar interaction in the complex, the treatment of hydrophobic 
patches of the ligand exposed to the solvent upon binding, a more comprehensive treatment of the loss of rotational 
and translational entropy, etc.), are of the same nature as the problems explained in the last paragraph. Here 
also, a consistent scheme can be of extreme utility: one that does not involve itself with the mind-boggling 
complexity of the physico-chemical interactions, but circumvents it by assuming that all the aforementioned 
properties come into being due to some form of interatomic interaction, involving electromagnetic forces, between 
a set of atoms, and then attempts to identify the number of atoms necessary to produce the property under 
consideration. Since the algorithm proposed here targets the transition zone from nano-scale individualistic 
properties to mesoscopic and subsequently macroscopic properties, by targeting the number of interacting atoms 
rather than the property itself, it can overlook the complex physico-chemical details. Yet it can monitor the 
threshold number of atoms from which the emergence of a particular property is observed. This helps the user to 
identify the possible dependencies one property can have on the others and to predict which ones are more 
fundamental than the others. 
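
One way such a scan over the number of atoms might be organized, for the linear description only, is sketched 
below. It uses the standard Kalman rank condition for a linear pair (A, C): the pair is observable when the 
stacked matrix [C; CA; ...; CA^(n-1)] has rank n. The readout row `C_row`, the ordering of atoms, and the choice 
of scanning leading sub-blocks of the contact map are all assumptions made for illustration; the sketch is not the 
algorithm of this work, and it does not cover the nonlinear criteria. 

```python
from fractions import Fraction


def mat_mul(P, Q):
    """Plain matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def rank(M):
    """Matrix rank by exact Gaussian elimination over Fractions."""
    M = [[Fraction(v) for v in row] for row in M]
    r = 0
    n_cols = len(M[0]) if M else 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


def is_observable(A, C):
    """Kalman rank test: rank of [C; CA; ...; CA^(n-1)] equals n."""
    n = len(A)
    rows, block = [], C
    for _ in range(n):
        rows.extend(block)
        block = mat_mul(block, A)
    return rank(rows) == n


def observable_by_size(A, C_row):
    """For k = 1..n, test the leading k-atom sub-system with the truncated readout."""
    results = []
    for k in range(1, len(A) + 1):
        A_k = [row[:k] for row in A[:k]]
        results.append(is_observable(A_k, [C_row[:k]]))
    return results


# Toy input: a three-atom chain, with a hypothetical readout of atoms 0 and 2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
C_row = [1, 0, 1]
print(observable_by_size(A, C_row))
```

In this toy case the symmetric readout makes the full three-atom chain fail the rank test while the smaller 
sub-systems pass, which illustrates that the outcome depends on the assumed readout as much as on the contact map. 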



4 Conclusion 

An algorithm to study the lower threshold of emergence for various biophysical properties is presented in this work. 
Categorical linkages between a rigorous mathematical backbone and protein biophysical properties are established. 
An exact knowledge of these limits holds paramount utilitarian importance in the nascent field of Nano-Bioscience. 
These limits also provide the contemporary state of protein-interior knowledge with constructs to investigate the 
fundamental questions of protein biophysics. In the near future, when computational costs become less prohibitive, 
these algorithms can be implemented to answer questions like "precisely how many atoms are necessary for us to 
observe hydrophobicity in a protein under a specified biological context?" 